{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/master/tutorials/W1D3_ModelFitting/student/W1D3_Tutorial4.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "# Neuromatch Academy: Week 1, Day 3, Tutorial 4\n",
    "# Model Fitting: Multiple linear regression and polynomial regression\n",
    "\n",
    "**Content creators**: Pierre-Étienne Fiquet, Anqi Wu, Alex Hyafil with help from Byron Galbraith, Ella Batty\n",
    "\n",
    "**Content reviewers**: Lina Teichmann, Saeed Salehi, Patrick Mineault, Michael Waskom"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "#Tutorial Objectives\n",
    "\n",
    "This is Tutorial 4 of a series on fitting models to data. We start with simple linear regression, using least squares optimization (Tutorial 1) and Maximum Likelihood Estimation (Tutorial 2). We will use bootstrapping to build confidence intervals around the inferred linear model parameters (Tutorial 3). We'll finish our exploration of regression models by generalizing to multiple linear regression and polynomial regression (Tutorial 4). We end by learning how to choose between these various models. We discuss the bias-variance trade-off (Tutorial 5) and Cross Validation for model selection (Tutorial 6).\n",
    "\n",
    "In this tutorial, we will generalize the regression model to incorporate multiple features.\n",
    "- Learn how to structure inputs for regression using the 'Design Matrix'\n",
    "- Generalize the MSE for multiple features using the ordinary least squares estimator\n",
    "- Visualize data and model fit in multiple dimensions\n",
    "- Fit polynomial regression models of different complexity\n",
    "- Plot and evaluate the polynomial regression fits\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 537
    },
    "colab_type": "code",
    "outputId": "e9e292d7-b8a3-4b1b-90f8-cfb3fe193b58"
   },
   "outputs": [],
   "source": [
    "#@title Video 1: Multiple Linear Regression and Polynomial Regression\n",
    "from IPython.display import YouTubeVideo\n",
    "video = YouTubeVideo(id=\"d4nfTki6Ejc\", width=854, height=480, fs=1)\n",
    "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n",
    "video"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "#@title Figure Settings\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/nma.mplstyle\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "#@title Helper Functions\n",
    "\n",
    "def plot_fitted_polynomials(x, y, theta_hat):\n",
    "  \"\"\" Plot polynomials of different orders\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "    theta_hat (dict): polynomial regression weights for different orders\n",
    "  \"\"\"\n",
    "\n",
    "  x_grid = np.linspace(x.min() - .5, x.max() + .5)\n",
    "\n",
    "  plt.figure()\n",
    "\n",
    "  for order in range(0, max_order + 1):\n",
    "    X_design = make_design_matrix(x_grid, order)\n",
    "    plt.plot(x_grid, X_design @ theta_hat[order]);\n",
    "\n",
    "  plt.ylabel('y')\n",
    "  plt.xlabel('x')\n",
    "  plt.plot(x, y, 'C0.');\n",
    "  plt.legend([f'order {o}' for o in range(max_order + 1)], loc=1)\n",
    "  plt.title('polynomial fits')\n",
    "  plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Section 1: Multiple Linear Regression\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Now that we have considered the univariate case and how to produce confidence intervals for our estimator, we turn to the general linear regression case, where we can have more than one regressor, or feature, in our input.\n",
    "\n",
    "Recall that our original univariate linear model was given as\n",
    "\n",
    "\\begin{align}\n",
    "y = \\theta x + \\epsilon\n",
    "\\end{align}\n",
    "\n",
    "where $\\theta$ is the slope and $\\epsilon$ some noise. We can easily extend this to the multivariate scenario by adding another parameter for each additional feature\n",
    "\n",
    "\\begin{align}\n",
    "y = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + ... +\\theta_d x_d + \\epsilon\n",
    "\\end{align}\n",
    "\n",
    "where $\\theta_0$ is the intercept and $d$ is the number of features (it is also the dimensionality of our input).\n",
    "\n",
    "We can condense this succinctly using vector notation for a single data point\n",
    "\n",
    "\\begin{align}\n",
    "y_i = \\boldsymbol{\\theta}^{\\top}\\mathbf{x}_i + \\epsilon\n",
    "\\end{align}\n",
    "\n",
    "and fully in matrix form\n",
    "\n",
    "\\begin{align}\n",
    "\\mathbf{y} = \\mathbf{X}\\boldsymbol{\\theta} + \\mathbf{\\epsilon}\n",
    "\\end{align}\n",
    "\n",
    "where $\\mathbf{y}$ is a vector of measurements, $\\mathbf{X}$ is a matrix containing the feature values (columns) for each input sample (rows), and $\\boldsymbol{\\theta}$ is our parameter vector.\n",
    "\n",
    "This matrix $\\mathbf{X}$ is often referred to as the \"[design matrix](https://en.wikipedia.org/wiki/Design_matrix)\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "For this tutorial we will focus on the two-dimensional case ($d=2$), which allows us to easily visualize our results. As an example, think of a situation where a scientist records the spiking response of a retinal ganglion cell to patterns of light signals that vary in contrast and in orientation. Then contrast and orientation values can be used as features / regressors to predict the cells response.\n",
    "\n",
    "In this case our model can be writen as:\n",
    "\n",
    "\\begin{align}\n",
    "y = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\epsilon\n",
    "\\end{align}\n",
    "\n",
    "or in matrix form where\n",
    "\n",
    "\\begin{align}\n",
    "\\mathbf{X} = \n",
    "\\begin{bmatrix}\n",
    "1 & x_{1,1} & x_{1,2} \\\\\n",
    "1 & x_{2,1} & x_{2,2} \\\\\n",
    "\\vdots & \\vdots & \\vdots \\\\\n",
    "1 & x_{n,1} & x_{n,2}\n",
    "\\end{bmatrix}, \n",
    "\\boldsymbol{\\theta} =\n",
    "\\begin{bmatrix}\n",
    "\\theta_0 \\\\\n",
    "\\theta_1 \\\\\n",
    "\\theta_2 \\\\\n",
    "\\end{bmatrix}\n",
    "\\end{align}\n",
    "\n",
    "For our actual exploration dataset we shall set $\\boldsymbol{\\theta}=[0, -2, -3]$ and draw $N=40$ noisy samples from $x \\in [-2,2)$. Note that setting the value of $\\theta_0 = 0$ effectively ignores the offset term."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 430
    },
    "colab_type": "code",
    "outputId": "0a93eff1-1dbe-4bc9-9cea-7604faf9493e"
   },
   "outputs": [],
   "source": [
    "#@title\n",
    "#@markdown Execute this cell to simulate some data\n",
    "\n",
    "# Set random seed for reproducibility\n",
    "np.random.seed(1234)\n",
    "\n",
    "# Set parameters\n",
    "theta = [0, -2, -3]\n",
    "n_samples = 40\n",
    "\n",
    "# Draw x and calculate y\n",
    "n_regressors = len(theta)\n",
    "x0 = np.ones((n_samples, 1))\n",
    "x1 = np.random.uniform(-2, 2, (n_samples, 1))\n",
    "x2 = np.random.uniform(-2, 2, (n_samples, 1))\n",
    "X = np.hstack((x0, x1, x2))\n",
    "noise = np.random.randn(n_samples)\n",
    "y = X @ theta + noise\n",
    "\n",
    "\n",
    "ax = plt.subplot(projection='3d')\n",
    "ax.plot(X[:,1], X[:,2], y, '.')\n",
    "\n",
    "ax.set(\n",
    "    xlabel='$x_1$',\n",
    "    ylabel='$x_2$',\n",
    "    zlabel='y'\n",
    ")\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Now that we have our dataset, we want to find an optimal vector of paramters $\\boldsymbol{\\hat\\theta}$. Recall our analytic solution to minimizing MSE for a single regressor:\n",
    "\n",
    "\\begin{align}\n",
    "\\hat\\theta = \\frac{\\sum_{i=1}^N x_i y_i}{\\sum_{i=1}^N x_i^2}.\n",
    "\\end{align}\n",
    "\n",
    "The same holds true for the multiple regressor case, only now expressed in matrix form\n",
    "\n",
    "\\begin{align}\n",
    "\\boldsymbol{\\hat\\theta} = (\\mathbf{X}^\\top\\mathbf{X})^{-1}\\mathbf{X}^\\top\\mathbf{y}.\n",
    "\\end{align}\n",
    "\n",
    "This is called the [ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) (OLS) estimator."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Exercise 1: Ordinary Least Squares Estimator\n",
    "\n",
    "In this exercise you will implement the OLS approach to estimating $\\boldsymbol{\\hat\\theta}$ from the design matrix $\\mathbf{X}$ and measurement vector $\\mathbf{y}$. You can use the `@` symbol for matrix multiplication, `.T` for transpose, and `np.linalg.inv` for matrix inversion.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def ordinary_least_squares(X, y):\n",
    "  \"\"\"Ordinary least squares estimator for linear regression.\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): design matrix of shape (n_samples, n_regressors)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "\n",
    "  Returns:\n",
    "    ndarray: estimated parameter values of shape (n_regressors)\n",
    "  \"\"\"\n",
    "  ######################################################################\n",
    "  ## TODO for students: solve for the optimal parameter vector using OLS\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: solve for theta_hat vector using OLS\")\n",
    "  ######################################################################\n",
    "\n",
    "  # Compute theta_hat using OLS\n",
    "  theta_hat = ...\n",
    "\n",
    "  return theta_hat\n",
    "\n",
    "\n",
    "# Uncomment below to test your function\n",
    "# theta_hat = ordinary_least_squares(X, y)\n",
    "# print(theta_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cellView": "both",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "text",
    "outputId": "8132759a-2b3d-4840-a38e-11390cc50118"
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/master//tutorials/W1D3_ModelFitting/solutions/W1D3_Tutorial4_Solution_2b5abc38.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "After filling in this function, you should see that $\\hat{\\theta}$ = [ 0.13861386, -2.09395731, -3.16370742]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Now that we have our $\\mathbf{\\hat\\theta}$, we can obtain $\\mathbf{\\hat y}$ and thus our mean squared error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "outputId": "a39e5523-3cf0-45e4-ffd1-e3ff37496495"
   },
   "outputs": [],
   "source": [
    "# Compute predicted data\n",
    "theta_hat = ordinary_least_squares(X, y)\n",
    "y_hat = X @ theta_hat\n",
    "\n",
    "# Compute MSE\n",
    "print(f\"MSE = {np.mean((y - y_hat)**2):.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Finally, the following code will plot a geometric visualization of the data points (blue) and fitted plane. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 430
    },
    "colab_type": "code",
    "outputId": "93043093-ff27-4493-98fb-dde6a8294425"
   },
   "outputs": [],
   "source": [
    "#@title\n",
    "#@markdown Execute this cell to visualize data and predicted plane\n",
    "\n",
    "theta_hat = ordinary_least_squares(X, y)\n",
    "xx, yy = np.mgrid[-2:2:50j, -2:2:50j]\n",
    "y_hat_grid = np.array([xx.flatten(), yy.flatten()]).T @ theta_hat[1:]\n",
    "y_hat_grid = y_hat_grid.reshape((50, 50))\n",
    "\n",
    "ax = plt.subplot(projection='3d')\n",
    "ax.plot(X[:, 1], X[:, 2], y, '.')\n",
    "ax.plot_surface(xx, yy, y_hat_grid, linewidth=0, alpha=0.5, color='C1',\n",
    "                cmap=plt.get_cmap('coolwarm'))\n",
    "\n",
    "for i in range(len(X)):\n",
    "  ax.plot((X[i, 1], X[i, 1]),\n",
    "          (X[i, 2], X[i, 2]),\n",
    "          (y[i], y_hat[i]),\n",
    "          'g-', alpha=.5)\n",
    "\n",
    "ax.set(\n",
    "    xlabel='$x_1$',\n",
    "    ylabel='$x_2$',\n",
    "    zlabel='y'\n",
    ")\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Section 2:  Polynomial Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "So far today, you learned how to predict outputs from inputs by fitting a linear regression model. We can now model all sort of relationships, including in neuroscience!  \n",
    "\n",
    "One potential problem with this approach is the simplicity of the model. Linear regression, as the name implies, can only capture a linear relationship between the inputs and outputs. Put another way, the predicted outputs are only a weighted sum of the inputs. What if there are more complicated computations happening? Luckily, many more complex models exist (and you will encounter many more over the next 3 weeks). One model that is still very simple to fit and understand, but captures more complex relationships, is **polynomial regression**, an extension of linear regression.\n",
    "\n",
    "Since polynomial regression is an extension of linear regression, everything you learned so far will come in handy now! The goal is the same: we want to predict the dependent variable $y_{n}$ given the input values $x_{n}$. The key change is the type of relationship between inputs and outputs that the model can capture. \n",
    "\n",
    "Linear regression models predict the outputs as a weighted sum of the inputs:\n",
    "\n",
    "$$y_{n}= \\theta_0 + \\theta x_{n} + \\epsilon_{n}$$\n",
    "\n",
    "With polynomial regression, we model the outputs as a polynomial equation based on the inputs. For example, we can model the outputs as:\n",
    "\n",
    "$$y_{n}= \\theta_0 + \\theta_1 x_{n} + \\theta_2 x_{n}^2 + \\theta_3 x_{n}^3 + \\epsilon_{n}$$\n",
    "\n",
    "We can change how complex a polynomial is fit by changing the order of the polynomial. The order of a polynomial refers to the highest power in the polynomial. The equation above is a third order polynomial because the highest value x is raised to is 3. We could add another term ($+ \\theta_4 x_{n}^4$) to model an order 4 polynomial and so on.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "First, we will simulate some data to practice fitting polynomial regression models. We will generate random inputs $x$ and then compute y according to $y = x^2 - x - 2 $, with some extra noise both in the input and the output to make the model fitting exercise closer to a real life situation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 431
    },
    "colab_type": "code",
    "outputId": "604ec72a-b420-4cde-d507-a980bbaa3405"
   },
   "outputs": [],
   "source": [
    "#@title\n",
    "#@markdown Execute this cell to simulate some data\n",
    "\n",
    "# setting a fixed seed to our random number generator ensures we will always\n",
    "# get the same psuedorandom number sequence\n",
    "np.random.seed(121)\n",
    "n_samples = 30\n",
    "x = np.random.uniform(-2, 2.5, n_samples)  # inputs uniformly sampled from [-2, 2.5)\n",
    "y =  x**2 - x - 2   # computing the outputs\n",
    "\n",
    "output_noise = 1/8 * np.random.randn(n_samples)\n",
    "y += output_noise  # adding some output noise\n",
    "\n",
    "input_noise = 1/2 * np.random.randn(n_samples)\n",
    "x += input_noise  # adding some input noise\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(x, y)  # produces a scatter plot\n",
    "ax.set(xlabel='x', ylabel='y');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Section 2.1: Design matrix for polynomial regression\n",
    "\n",
    "Now we have the basic idea of polynomial regression and some noisy data, let's begin! The key difference between fitting a linear regression model and a polynomial regression model lies in how we structure the input variables.  \n",
    "\n",
    "For linear regression, we used $X = x$ as the input data. To add a constant bias (a y-intercept in a 2-D plot), we use $X = \\big[ \\boldsymbol 1, x \\big]$, where $\\boldsymbol 1$ is a column of ones.  When fitting, we learn a weight for each column of this matrix. So we learn a weight that multiples with column 1 - in this case that column is all ones so we gain the bias parameter ($+ \\theta_0$). We also learn a weight for every column, or every feature of x, as we learned in Section 1.\n",
    "\n",
    "This matrix $X$ that we use for our inputs is known as a **design matrix**. We want to create our design matrix so we learn weights for $x^2, x^3,$ etc. Thus, we want to build our design matrix $X$ for polynomial regression of order $k$ as:\n",
    "\n",
    "$$X = \\big[ \\boldsymbol 1 , x^1, x^2 , \\ldots , x^k \\big],$$\n",
    "\n",
    "where $\\boldsymbol{1}$ is the vector the same length as $x$ consisting of of all ones, and $x^p$ is the vector or matrix $x$ with all elements raised to the power $p$. Note that $\\boldsymbol{1} = x^0$ and $x^1 = x$ "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Exercise 2: Structure design matrix\n",
    "\n",
    "Create a function (`make_design_matrix`) that structures the design matrix given the input data and the order of the polynomial you wish to fit. We will print part of this design matrix for our data and order 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def make_design_matrix(x, order):\n",
    "  \"\"\"Create the design matrix of inputs for use in polynomial regression\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    order (scalar): polynomial regression order\n",
    "\n",
    "  Returns:\n",
    "    ndarray: design matrix for polynomial regression of shape (samples, order+1)\n",
    "  \"\"\"\n",
    "  ########################################################################\n",
    "  ## TODO for students: create the design matrix ##\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: create the design matrix\")\n",
    "  ########################################################################\n",
    "\n",
    "  # Broadcast to shape (n x 1) so dimensions work\n",
    "  if x.ndim == 1:\n",
    "    x = x[:, None]\n",
    "\n",
    "  #if x has more than one feature, we don't want multiple columns of ones so we assign\n",
    "  # x^0 here\n",
    "  design_matrix = np.ones((x.shape[0], 1))\n",
    "\n",
    "  # Loop through rest of degrees and stack columns (hint: np.hstack)\n",
    "  for degree in range(1, order + 1):\n",
    "      design_matrix = ...\n",
    "\n",
    "  return design_matrix\n",
    "\n",
    "\n",
    "# Uncomment to test your function\n",
    "order = 5\n",
    "# X_design = make_design_matrix(x, order)\n",
    "# print(X_design[0:2, 0:2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cellView": "both",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "text",
    "outputId": "f39aeb8b-d933-49f1-ad9f-3fbbab781871"
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/master//tutorials/W1D3_ModelFitting/solutions/W1D3_Tutorial4_Solution_f8576b48.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "You should see that the printed section of this design matrix is `[[ 1.         -1.51194917]\n",
    " [ 1.         -0.35259945]]`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Section 2.2: Fitting polynomial regression models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Now that we have the inputs structured correctly in our design matrix, fitting a polynomial regression is the same as fitting a linear regression model! All of the polynomial structure we need to learn is contained in how the inputs are structured in the design matrix. We can use the same least squares solution we computed in previous exercises. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Exercise 3: Fitting polynomial regression models with different orders "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Here, we will fit polynomial regression models to find the regression coefficients ($\\theta_0, \\theta_1, \\theta_2,$ ...) by solving the least squares problem. Create a function `solve_poly_reg` that loops over different order polynomials (up to `max_order`), fits that model, and saves out the weights for each. You may invoke the `ordinary_least_squares` function. \n",
    "\n",
    "We will then qualitatively inspect the quality of our fits for each order by plotting the fitted polynomials on top of the data. In order to see smooth curves, we evaluate the fitted polynomials on a grid of $x$ values (ranging between the largest and smallest of the inputs present in the dataset)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def solve_poly_reg(x, y, max_order):\n",
    "  \"\"\"Fit a polynomial regression model for each order 0 through max_order.\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "    max_order (scalar): max order for polynomial fits\n",
    "\n",
    "  Returns:\n",
    "    dict: fitted weights for each polynomial model (dict key is order)\n",
    "  \"\"\"\n",
    "\n",
    "  # Create a dictionary with polynomial order as keys,\n",
    "  # and np array of theta_hat (weights) as the values\n",
    "  theta_hats = {}\n",
    "\n",
    "  # Loop over polynomial orders from 0 through max_order\n",
    "  for order in range(max_order + 1):\n",
    "\n",
    "    ##################################################################################\n",
    "    ## TODO for students: Create design matrix and fit polynomial model for this order\n",
    "    # Fill out function and remove\n",
    "    raise NotImplementedError(\"Student exercise: fit a polynomial model\")\n",
    "    ##################################################################################\n",
    "\n",
    "    # Create design matrix\n",
    "    X_design = ...\n",
    "\n",
    "    # Fit polynomial model\n",
    "    this_theta = ...\n",
    "\n",
    "    theta_hats[order] = this_theta\n",
    "\n",
    "  return theta_hats\n",
    "\n",
    "\n",
    "# Uncomment to test your function\n",
    "max_order = 5\n",
    "# theta_hats = solve_poly_reg(x, y, max_order)\n",
    "# plot_fitted_polynomials(x, y, theta_hats)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cellView": "both",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 433
    },
    "colab_type": "text",
    "outputId": "426ccb6c-995c-4bc7-806c-6c3eb91c4bdd"
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/master//tutorials/W1D3_ModelFitting/solutions/W1D3_Tutorial4_Solution_9b1eebd9.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=560 height=416 src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/tutorials/W1D3_ModelFitting/static/W1D3_Tutorial4_Solution_9b1eebd9_0.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Section 2.4: Evaluating fit quality\n",
    "\n",
    "As with linear regression, we can compute mean squared error (MSE) to get a sense of how well the model fits the data. \n",
    "\n",
    "We compute MSE as:\n",
    "\n",
    "$$ MSE = \\frac 1 N ||y - \\hat y||^2 = \\frac 1 N \\sum_{i=1}^N (y_i - \\hat y_i)^2 $$\n",
    "\n",
    "where the predicted values for each model are given by $ \\hat y = X \\hat \\theta$. \n",
    "\n",
    "*Which model (i.e. which polynomial order) do you think will have the best MSE?*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Exercise 4: Compute MSE and compare models\n",
    "\n",
    "We will compare the MSE for different polynomial orders with a bar plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 483
    },
    "colab_type": "code",
    "outputId": "e6505809-8238-499d-821f-cd1c287b78e9"
   },
   "outputs": [],
   "source": [
    "mse_list = []\n",
    "order_list = list(range(max_order + 1))\n",
    "\n",
    "for order in order_list:\n",
    "\n",
    "  X_design = make_design_matrix(x, order)\n",
    "\n",
    "  ############################################################################\n",
    "  # TODO for students: Compute MSE (fill in ... sections)\n",
    "  #############################################################################\n",
    "\n",
    "  # Get prediction for the polynomial regression model of this order\n",
    "  y_hat = ...\n",
    "\n",
    "  # Compute the residuals\n",
    "  residuals = ...\n",
    "\n",
    "  # Compute the MSE\n",
    "  mse = ...\n",
    "\n",
    "  mse_list.append(mse)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "# Uncomment once above exercise is complete\n",
    "# ax.bar(order_list, mse_list)\n",
    "\n",
    "ax.set(title='Comparing Polynomial Fits', xlabel='Polynomial order', ylabel='MSE')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cellView": "both",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 430
    },
    "colab_type": "text",
    "outputId": "3fffc612-098e-409f-daa5-5439e9d1b613"
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/master//tutorials/W1D3_ModelFitting/solutions/W1D3_Tutorial4_Solution_ccea5dd7.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=558 height=414 src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/tutorials/W1D3_ModelFitting/static/W1D3_Tutorial4_Solution_ccea5dd7_0.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Summary\n",
    "\n",
    "* Linear regression generalizes naturally to multiple dimensions\n",
    "* Linear algebra affords us the mathematical tools to reason and solve such problems beyond the two dimensional case\n",
    "\n",
    "* To change from a linear regression model to a polynomial regression model, we only have to change how the input data is structured\n",
    "\n",
    "* We can choose the complexity of the model by changing the order of the polynomial model fit\n",
    "\n",
    "* Higher order polynomial models tend to have lower MSE on the data they're fit with\n",
    "\n",
    "**Note**: In practice, multidimensional least squares problems can be solved very efficiently (thanks to numerical routines such as LAPACK).\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Suggested readings**  \n",
    "\n",
    "[Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares](http://vmls-book.stanford.edu/)\n",
    "Stephen Boyd and Lieven Vandenberghe"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W1D3_Tutorial4",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
